Automated Test: monitor-incident-refactor-after #311

admin-coderabbit · 2026-02-04T17:59:59Z

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

New Features
- Improved incident occurrence tracking for monitor failures with detailed failure metadata, environment context, and human-readable failure summaries.
- Enhanced failure reason generation with pluralization support and contextual information for better incident visibility.
Refactor
- Streamlined monitor failure handling by consolidating internal orchestration logic into a unified incident threshold evaluation process, improving code maintainability.

…0528) Since we'll be doing more with issue occurrences, split out the concept of incidents into its own logic module, as well as incident_occurrence into its own module. Part of GH-80527

coderabbit-eval · 2026-02-04T18:00:53Z

📝 Walkthrough

The changes introduce incident occurrence publishing for monitor failures. A new module creates and emits incident occurrences with failure metadata and evidence, while a new function evaluates monitor environments for incident state transitions based on failure thresholds. The existing mark_failed function is refactored to delegate to this new incident threshold logic.

Changes

Cohort / File(s)	Summary
Incident Occurrence Publishing `src/sentry/monitors/logic/incident_occurrence.py`	New module introducing `create_incident_occurrence()` to construct and emit IssueOccurrence objects to Kafka with failure metadata, evidence_display (reason, environment, last check-in), and contexts. Includes helper functions `get_failure_reason()` for human-readable failure summaries and `get_monitor_environment_context()` for context dictionaries, plus mappings for localized failure status strings.
Incident Threshold Logic `src/sentry/monitors/logic/incidents.py`	New function `try_incident_threshold()` to evaluate monitor environments for incident state transitions based on failure thresholds. Handles environment status evaluation, check-in sequence retrieval, incident creation/retrieval, and delegates to `create_incident_occurrence()` for each failure when appropriate. Emits monitor_environment_failed signal.
Mark Failed Refactoring `src/sentry/monitors/logic/mark_failed.py`	Simplified to delegate incident orchestration to `try_incident_threshold()`, removing ~260 lines of internal threshold computation, incident management, and occurrence publishing logic. The function now returns the boolean result of the threshold evaluation.
Type Definitions `src/sentry/monitors/types.py`	Added new TypedDict `SimpleCheckIn` with fields for id, date_added, and status to represent minimal check-in objects.
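As an illustration of the new type, here is a minimal sketch of a SimpleCheckIn TypedDict with the three fields listed above. The field types (int, datetime, int) are assumptions; the summary names only the fields, not their types.

```python
from datetime import datetime, timezone
from typing import TypedDict


class SimpleCheckIn(TypedDict):
    # Field types are assumed; the summary only lists the field names.
    id: int
    date_added: datetime
    status: int


# A TypedDict is a plain dict at runtime, so rows fetched as
# {"id": ..., "date_added": ..., "status": ...} dicts fit it directly.
checkin: SimpleCheckIn = {
    "id": 42,
    "date_added": datetime(2026, 2, 4, 17, 0, tzinfo=timezone.utc),
    "status": 4,  # hypothetical integer status code
}
```

A TypedDict adds no runtime wrapper: the keys are checked by the type checker, not at runtime, which keeps a "minimal check-in" cheap to build from query results.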

Sequence Diagram

sequenceDiagram
    participant Check as Failed Check-in
    participant MarkFailed as mark_failed()
    participant Threshold as try_incident_threshold()
    participant Store as Monitor Environment<br/>(Database)
    participant Incident as Incident<br/>(Database)
    participant Occurrence as create_incident_occurrence()
    participant Kafka as Kafka

    Check->>MarkFailed: failed_checkin, ts, received
    MarkFailed->>Threshold: failed_checkin, failure_issue_threshold, received
    
    Threshold->>Store: Get monitor_environment
    alt monitor_env exists and status is OK/ACTIVE
        Threshold->>Store: Fetch previous check-ins up to threshold
        Threshold->>Store: Set status to ERROR, save
        Threshold->>Incident: get_or_create incident
        Incident-->>Threshold: incident instance
    else status is ERROR
        Threshold->>Incident: Retrieve active incident
    end
    
    alt Incident exists and not muted
        Threshold->>Occurrence: create_incident_occurrence(failed_checkins, checkin, incident, received)
        Occurrence->>Kafka: produce_occurrence_to_kafka(IssueOccurrence)
        Kafka-->>Occurrence: ✓ Published
    end
    
    Threshold-->>MarkFailed: boolean result
    MarkFailed-->>Check: True/False
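The branching in the diagram can be modelled in memory to see which inputs lead to an incident. This is a simplified sketch under stated assumptions, not the code in incidents.py: EnvStatus, CheckInStatus, Incident, MonitorEnv, and try_incident_threshold_sketch are hypothetical stand-ins, appending to incident.occurrences replaces the Kafka produce step, and the database writes, the monitor_environment_failed signal, and monitor-level muting are omitted. Which paths return True is also an assumption; the diagram only says a boolean comes back.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EnvStatus(Enum):
    # Hypothetical stand-in for the monitor environment status.
    OK = "ok"
    ACTIVE = "active"
    ERROR = "error"
    DISABLED = "disabled"


class CheckInStatus(Enum):
    # Hypothetical stand-in for the check-in status.
    OK = "ok"
    ERROR = "error"
    MISSED = "missed"
    TIMEOUT = "timeout"


@dataclass
class Incident:
    starting_checkin_id: int
    # Stand-in for occurrences produced to Kafka: (checkin_id, status) pairs.
    occurrences: list = field(default_factory=list)


@dataclass
class MonitorEnv:
    status: EnvStatus
    checkins: list  # (id, CheckInStatus) tuples, oldest first
    is_muted: bool = False
    active_incident: Optional[Incident] = None


def try_incident_threshold_sketch(env: MonitorEnv, failure_issue_threshold: int) -> bool:
    if env.status in (EnvStatus.OK, EnvStatus.ACTIVE):
        # The newest `failure_issue_threshold` check-ins, still oldest first.
        previous = env.checkins[-failure_issue_threshold:]
        # Any OK check-in inside the window means the threshold is not reached.
        if any(status == CheckInStatus.OK for _, status in previous):
            return False
        env.status = EnvStatus.ERROR
        # get_or_create: reuse an open incident, otherwise start a new one.
        if env.active_incident is None:
            env.active_incident = Incident(starting_checkin_id=previous[0][0])
        failed = previous
    elif env.status == EnvStatus.ERROR:
        # Already failing: only the newest check-in gets an occurrence.
        failed = env.checkins[-1:]
    else:
        # Other statuses (e.g. disabled) never produce occurrences.
        return False

    incident = env.active_incident
    if incident is not None and not env.is_muted:
        for checkin_id, status in failed:
            incident.occurrences.append((checkin_id, status))
    return True
```

For example, with a threshold of 2 and history OK, MISSED, TIMEOUT, the environment flips to ERROR, the incident starts at the MISSED check-in, and both failures are recorded; a later failure while in ERROR adds just one occurrence.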

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 The monitors now speak with care,
Each failure captured in the air,
Through Kafka's stream, occurrences flow,
Incidents tracked from high to low,
A refactored dance, both lean and bright! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title refers to a test automation process and does not describe the actual code change, the refactoring of monitor incident logic.	Revise title to describe the primary change, such as 'Refactor monitor incident handling into dedicated modules' or 'Extract incident occurrence logic into separate module'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbit-eval

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/sentry/monitors/logic/incident_occurrence.py`:
- Around line 159-171: In get_monitor_environment_context(MonitorEnvironment),
the function copies monitor.config to local variable config and mutates
schedule_type but then returns monitor_environment.monitor.config (the
original); fix by returning the modified local config (or assign the modified
copy back) in the returned dict under the "config" key so the schedule_type
display change is preserved.
- Around line 130-156: Fix three issues in get_failure_reason: correct the
docstring typo to "human readable"; change the membership check when building
status_counts to "if checkin['status'] in HUMAN_FAILURE_STATUS_MAP" (remove
.keys()); and replace list(status_counts.keys())[0] with
next(iter(status_counts.keys())) (or next(iter(status_counts))) when selecting
the single status for SINGULAR_HUMAN_FAILURE_MAP to avoid creating a full list.

🧹 Nitpick comments (1)

src/sentry/monitors/logic/incidents.py (1)
36-53: Unnecessary list materialization inside any().

Line 52 uses any([...]) with a list comprehension. A generator expression (any(... for ...)) short-circuits without allocating the full list.
Proposed fix
-            if any([checkin["status"] == CheckInStatus.OK for checkin in previous_checkins]):
+            if any(checkin["status"] == CheckInStatus.OK for checkin in previous_checkins):
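To illustrate the nitpick, a small self-contained check (plain strings stand in for CheckInStatus values, and is_ok is a hypothetical helper that counts its calls). Both forms return the same result, but the list comprehension evaluates every element before any() runs, while the generator stops at the first match.

```python
calls: list[str] = []


def is_ok(status: str) -> bool:
    # Record every evaluation so the two forms can be compared.
    calls.append(status)
    return status == "ok"


statuses = ["ok", "missed", "missed", "timeout"]

calls.clear()
list_result = any([is_ok(s) for s in statuses])  # list is fully built first
list_calls = len(calls)

calls.clear()
gen_result = any(is_ok(s) for s in statuses)  # stops at the first True
gen_calls = len(calls)

print(list_result, list_calls)  # True 4
print(gen_result, gen_calls)  # True 1
```

Since the window here holds at most failure_issue_threshold check-ins, the saving is small; the change is mostly about idiom.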

coderabbit-eval · 2026-02-04T18:08:46Z

src/sentry/monitors/logic/incident_occurrence.py

+def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]):
+    """
+    Builds a humam readible string from a list of failed check-ins.
+
+    "3 missed check-ins detected"
+    "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected"
+    "A failed check-in was detected"
+    """
+
+    status_counts = Counter(
+        checkin["status"]
+        for checkin in failed_checkins
+        if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()
+    )
+
+    if sum(status_counts.values()) == 1:
+        return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+
+    human_status = get_text_list(
+        [
+            "%(count)d %(status)s" % {"count": count, "status": HUMAN_FAILURE_STATUS_MAP[status]}
+            for status, count in status_counts.items()
+        ],
+        last_word=_("and"),
+    )
+
+    return _("%(problem_checkins)s check-ins detected") % {"problem_checkins": human_status}


⚠️ Potential issue | 🟡 Minor

Minor issues: typo in docstring and static analysis findings.

Line 132: Typo — "humam readible" → "human readable".

Line 142: Idiomatic Python: in HUMAN_FAILURE_STATUS_MAP suffices, because a membership test on a dict already checks its keys, so .keys() is redundant.

Line 146: Per Ruff RUF015, prefer next(iter(status_counts.keys())) over list(status_counts.keys())[0] to avoid materializing the full list for a single element.

Proposed fixes

     """
-    Builds a humam readible string from a list of failed check-ins.
+    Builds a human readable string from a list of failed check-ins.

-        if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()
+        if checkin["status"] in HUMAN_FAILURE_STATUS_MAP

-        return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+        return SINGULAR_HUMAN_FAILURE_MAP[next(iter(status_counts))]

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 146-146: Prefer next(iter(status_counts.keys())) over single element slice

Replace with next(iter(status_counts.keys()))

(RUF015)
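With all three fixes applied, get_failure_reason can be exercised outside Sentry as a self-contained sketch. Assumptions: statuses are plain strings, the contents of HUMAN_FAILURE_STATUS_MAP and SINGULAR_HUMAN_FAILURE_MAP are invented for illustration (the real maps are defined elsewhere in incident_occurrence.py), get_text_list is a local stand-in for the helper the module calls (presumably Django's django.utils.text.get_text_list, whose signature matches), and translation via _() is omitted.

```python
from collections import Counter
from collections.abc import Sequence
from datetime import datetime
from typing import TypedDict


class SimpleCheckIn(TypedDict):
    id: int
    date_added: datetime
    status: str  # hypothetical: plain strings instead of CheckInStatus values


# Hypothetical map contents, for illustration only.
HUMAN_FAILURE_STATUS_MAP = {"error": "error", "missed": "missed", "timeout": "timeout"}
SINGULAR_HUMAN_FAILURE_MAP = {
    "error": "An error check-in was detected",
    "missed": "A missed check-in was detected",
    "timeout": "A timeout check-in was detected",
}


def get_text_list(items: list[str], last_word: str = "or") -> str:
    # Local stand-in producing "a, b and c" style lists.
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return "%s %s %s" % (", ".join(items[:-1]), last_word, items[-1])


def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]) -> str:
    """Builds a human readable string from a list of failed check-ins."""
    status_counts = Counter(
        checkin["status"]
        for checkin in failed_checkins
        if checkin["status"] in HUMAN_FAILURE_STATUS_MAP  # dict membership checks keys
    )

    if sum(status_counts.values()) == 1:
        # RUF015: take the only key without materializing a list.
        return SINGULAR_HUMAN_FAILURE_MAP[next(iter(status_counts))]

    human_status = get_text_list(
        [
            "%(count)d %(status)s" % {"count": count, "status": HUMAN_FAILURE_STATUS_MAP[status]}
            for status, count in status_counts.items()
        ],
        last_word="and",
    )
    return "%(problem_checkins)s check-ins detected" % {"problem_checkins": human_status}
```

With these stand-ins, two missed, one timeout and one error check-in produce "2 missed, 1 timeout and 1 error check-ins detected"; Counter keeps first-seen order, so statuses are listed in the order they first appear.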


coderabbit-eval · 2026-02-04T18:08:46Z

src/sentry/monitors/logic/incident_occurrence.py

+def get_monitor_environment_context(monitor_environment: MonitorEnvironment):
+    config = monitor_environment.monitor.config.copy()
+    if "schedule_type" in config:
+        config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display()
+
+    return {
+        "id": str(monitor_environment.monitor.guid),
+        "slug": str(monitor_environment.monitor.slug),
+        "name": monitor_environment.monitor.name,
+        "config": monitor_environment.monitor.config,
+        "status": monitor_environment.get_status_display(),
+        "type": monitor_environment.monitor.get_type_display(),
+    }


⚠️ Potential issue | 🟠 Major

Bug: modified config copy is never used — original config is returned instead.

Line 160 creates a local copy of config and line 162 mutates schedule_type on that copy, but line 168 places monitor_environment.monitor.config (the original, unmodified dict) in the returned context. The schedule_type display transformation is silently discarded.

🐛 Proposed fix

     return {
         "id": str(monitor_environment.monitor.guid),
         "slug": str(monitor_environment.monitor.slug),
         "name": monitor_environment.monitor.name,
-        "config": monitor_environment.monitor.config,
+        "config": config,
         "status": monitor_environment.get_status_display(),
         "type": monitor_environment.monitor.get_type_display(),
     }
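To see the fix in isolation, a minimal sketch with a hypothetical Monitor dataclass standing in for the model (the environment, status, and type fields are left out, and the stored schedule_type value of 1 is invented for illustration). It returns the modified copy, so the context carries the display value while the stored config stays untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    # Hypothetical stand-in for the Monitor model.
    guid: str
    slug: str
    name: str
    config: dict = field(default_factory=dict)
    schedule_type_display: str = "Crontab"

    def get_schedule_type_display(self) -> str:
        return self.schedule_type_display


def get_monitor_context_sketch(monitor: Monitor) -> dict:
    # Copy first so the display value never leaks back into the stored config.
    config = monitor.config.copy()
    if "schedule_type" in config:
        config["schedule_type"] = monitor.get_schedule_type_display()
    # Return the modified copy, not monitor.config.
    return {"id": str(monitor.guid), "slug": monitor.slug, "name": monitor.name, "config": config}
```

dict.copy() is shallow, which is enough here because only a top-level key is replaced; nested values would still be shared with the original.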


ref(crons): Reorganize incident creation / issue occurrence logic (#8…

c3b32a7


coderabbit-eval bot closed this Feb 4, 2026

coderabbit-eval bot reviewed Feb 4, 2026
